Please indicate:

Question 1:

Consider the breast cancer data from HW02. Plot, using the ww shapefiles provided in Lecture 20, three maps:

Notes:

Breast Cancer Incidence Rate

It’s hard to see exactly what’s going on in the smaller census tracts, so let’s try zooming in.

Both plots indicate that the incidence rate of breast cancer is somewhere between 0.5% and 2.5%, though we do see one outlier with a rate of about 4%. Additionally, there doesn’t seem to be any clear difference between rural areas and the more metropolitan areas around Seattle and Tacoma.

Median Income

And another quick zoom to make sure we don’t lose the little guys.

Interestingly, the image shows that most of the wealthiest areas are just outside of the cities, not actually in the cities themselves.

Breast Cancer and Median Income

We will define our metric as the product of the cancer incidence rate and the median household income. Higher values correspond to census tracts in which the cancer rate, the median income, or both are higher than usual.
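A minimal sketch of this metric, using a small made-up data frame in place of the actual tract data. The column name `median_income` and the data frame name `tracts` are assumptions for illustration, and the numbers are invented.

```r
# Toy stand-in for the census tract data; all values are made up.
tracts <- data.frame(
  tract         = c("A", "B", "C", "D"),
  incidence     = c(0.010, 0.020, 0.010, 0.020),  # incidence rate (proportion)
  median_income = c(40000, 40000, 80000, 80000)   # median household income
)

# Metric described above: incidence rate times median household income.
tracts$metric <- tracts$incidence * tracts$median_income

# Tracts B and C end up with the same value, one driven by a higher
# rate and the other by a higher income.
print(tracts)
```

Because the metric is a product, a tract can score high through either factor, so it is worth reading the combined map alongside the two single-variable maps.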

Our image shows brighter spots in the wealthier areas, suggesting that wealthier communities tend to have higher rates of breast cancer.

It’s also worth taking a look at a plot of the two variables.

## Warning: Removed 711 rows containing non-finite values (stat_boxplot).

There does appear to be a slightly increasing trend: as the median household income increases, so does the incidence rate of breast cancer. We can check whether the mean rate differs across income quantiles using a one-way ANOVA, set up as a quick hypothesis test:

\(H_0\): The mean cancer incidence rate is the same in every income quantile.

\(H_A\): The mean cancer incidence rate differs for at least one income quantile.

trend <- aov(incidence ~ factor(income_quantile), data = census)
summary(trend)
##                          Df   Sum Sq   Mean Sq F value   Pr(>F)    
## factor(income_quantile)   4 0.000203 5.087e-05   6.971 1.59e-05 ***
## Residuals               881 0.006430 7.300e-06                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 711 observations deleted due to missingness

With a p-value of 1.59e-05, well below 0.05, we reject \(H_0\) in favor of \(H_A\). The data suggest a statistically significant difference in the mean cancer rate across the income quantiles, although the differences amount to only a few fractions of a percent. The F test on its own does not show which quantiles differ or in which direction.
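As a possible follow-up, the sketch below shows two ways to look past the omnibus F test: pairwise comparisons with Tukey’s HSD, and a simple linear trend test that treats the quantile as a number. It runs on simulated data, not the census data; the data frame `sim`, the column `quantile_f`, and the simulated effect size are all assumptions made for illustration.

```r
set.seed(1)

# Simulated stand-in for the tract data: five income quantiles, 50 tracts each.
sim <- data.frame(income_quantile = rep(1:5, each = 50))
sim$incidence <- 0.010 + 0.0005 * sim$income_quantile +
  rnorm(nrow(sim), sd = 0.003)
sim$quantile_f <- factor(sim$income_quantile)

# Omnibus one-way ANOVA: do the group means differ at all?
fit <- aov(incidence ~ quantile_f, data = sim)
print(summary(fit))

# Pairwise differences between quantiles, adjusted for multiple comparisons.
pairwise <- TukeyHSD(fit)
print(pairwise)

# Linear trend: the slope estimates the change in rate per quantile step.
trend_fit <- lm(incidence ~ income_quantile, data = sim)
print(summary(trend_fit)$coefficients)
```

The trend model is closer to the question of whether the rate increases with income, while the Tukey comparisons indicate which pairs of quantiles differ.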

Question 2:

Download the results of the 2000 election from the School of Public Affairs at American University in DC and create two maps involving only the lower 48 states that show:

where

Then answer the following questions:

  1. Comment on the biggest differences when changing from “census tract” resolution to “state” resolution.
  2. Comment on how the maps align with the idea of the Nine Nations of North America.
  3. Which states exhibit the greatest within state heterogeneity in voting? Come up with a mathematical justification.
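One candidate measure for the heterogeneity question is the vote-weighted standard deviation of county-level vote share within each state. The sketch below is only an illustration under stated assumptions: the data frame `counties`, its columns `state`, `votes`, and `share`, and all of the numbers are hypothetical and do not come from the election file.

```r
# Hypothetical county-level data: total votes and one party's vote share.
counties <- data.frame(
  state = c("aa", "aa", "aa", "bb", "bb", "bb"),
  votes = c(1000, 2000, 1000, 1500, 1500, 1000),
  share = c(0.50, 0.52, 0.48, 0.20, 0.80, 0.50)
)

# Weighted standard deviation of x with weights w (population form).
weighted_sd <- function(x, w) {
  m <- sum(w * x) / sum(w)
  sqrt(sum(w * (x - m)^2) / sum(w))
}

# One heterogeneity value per state, largest first.
het <- sapply(split(counties, counties$state),
              function(d) weighted_sd(d$share, d$votes))
print(sort(het, decreasing = TRUE))
```

Weighting by votes keeps very small counties from dominating the measure; an unweighted standard deviation would be a simpler alternative.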

Notes:

# This function eliminates all non-alphanumeric characters and spaces and 
# converts all text to lower case.
clean.text <- function(text){
  text <- gsub("[^[:alnum:]]", "", text)
  text <- gsub(" ", "", text)
  text <- tolower(text)
  return(text)
}
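A short usage sketch of the cleaning function above, repeated here so the snippet runs on its own. The example names are made up; the point is that differently formatted spellings of the same name collapse to one key, which is what makes the cleaned column usable for matching.

```r
# Same steps as clean.text above: drop non-alphanumerics and spaces, lower-case.
clean.text <- function(text) {
  text <- gsub("[^[:alnum:]]", "", text)
  text <- gsub(" ", "", text)
  tolower(text)
}

# Made-up examples of name variants.
raw <- c("St. Louis", "st louis", "De Kalb", "O'Brien")
cleaned <- clean.text(raw)
print(cleaned)
```

The first two inputs map to the same string, so a join on the cleaned names would treat them as the same county.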

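# Packages used by this chunk (map_data() also needs the maps package installed)
library(ggplot2)
library(dplyr)

# State and county map of US in 2010
# as_tibble() replaces the deprecated tbl_df()
US.state <- map_data("state") %>% as_tibble()
US.county <- map_data("county") %>% as_tibble()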
US.county$subregion <- clean.text(US.county$subregion)

ggplot(US.county, aes(x=long, y=lat, group=group)) +
  geom_polygon(fill="white") +
  geom_path(col="black", size=0.01) +
  coord_map()

State

County

Question 3:

The Chief of the Portland Police is tired of reading through pages of crime reports and wants an interactive tool to visualize where different crimes occurred during the years 2004 through 2013. Obtain crime data for Portland for years 2004 through 2013 from the CivicApps site, create (in a separate .Rmd file) an appropriate Shiny app, and publish it online. Post the hyperlink here.

Your Shiny app should take in two inputs. Think carefully about the best way to have users input these:

Using this app, answer the following questions: